1. (Original) A method for determining dominant phrase vectors in a
topological vector space for a semantic content of a document on a computer system, the 
method comprising: 

accessing dominant phrases for the document, the dominant phrases representing a 
condensed content for the document; 

constructing at least one state vector in the topological vector space for each dominant 
phrase using a dictionary and a basis; and 

collecting the state vectors into the dominant phrase vectors for the document. 

2. (Original) A method according to claim 1, wherein accessing dominant 
phrases includes extracting the dominant phrases from the document using a phrase extractor. 

3. (Original) A method according to claim 1, wherein accessing dominant 
phrases includes storing the dominant phrases in computer memory accessible by the 
computer system. 

4. (Original) A method according to claim 1, the method further comprising 
forming a semantic abstract comprising the dominant phrase vectors. 

5. (Original) A method for determining dominant vectors in a topological 
vector space for a semantic content of a document on a computer system, the method 
comprising: 

storing the document in computer memory accessible by the computer system; 
extracting words from at least a portion of the document; 

constructing a state vector in the topological vector space for each word using a 
dictionary and a basis; 

filtering the state vectors; and 

collecting the filtered state vectors into the dominant vectors for the document. 

6. (Original) A method according to claim 5, wherein extracting words 
includes extracting words from the entire document. 

7. (Original) A method according to claim 5, wherein filtering the state
vectors includes selecting the state vectors that occur with highest frequencies. 

8. (Original) A method according to claim 5, wherein filtering the state 
vectors includes: 

calculating a centroid in the topological vector space for the state vectors; and 
selecting the state vectors nearest the centroid. 
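
By way of illustration only, and without limiting claims 7 and 8, the two filtering alternatives might be sketched as follows. The sketch assumes that state vectors are represented as tuples of floats, that "nearest" is measured with a Euclidean metric, and that a caller-supplied count `k` fixes how many vectors are kept; none of these choices is recited in the claims, and the function names are hypothetical.

```python
from collections import Counter
import math


def filter_by_frequency(state_vectors, k):
    """Keep the k distinct state vectors that occur most often (cf. claim 7)."""
    counts = Counter(state_vectors)
    return [vec for vec, _ in counts.most_common(k)]


def centroid(state_vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(state_vectors)
    return tuple(sum(component) / n for component in zip(*state_vectors))


def filter_by_centroid(state_vectors, k):
    """Keep the k state vectors nearest the centroid (cf. claim 8).

    Assumption: Euclidean distance; the claims do not name a metric.
    """
    c = centroid(state_vectors)
    return sorted(state_vectors, key=lambda v: math.dist(v, c))[:k]
```

Either function returns a reduced list that could then be collected into the dominant vectors of claim 5.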

9. (Original) A method according to claim 5, the method further comprising 
forming a semantic abstract comprising the dominant vectors. 

10. (Original) A computer-readable medium containing a program to 
determine dominant vectors in a topological vector space for a semantic content of a 
document on a computer system, the program being executable on the computer system to 
implement the method of claim 5. 

11. (Original) A method for determining a semantic abstract in a topological
vector space for a semantic content of a document on a computer system, the method 
comprising: 

storing the document in computer memory accessible by the computer system; 
determining dominant phrase vectors for the document; 
determining dominant vectors for the document; and 

generating the semantic abstract using the dominant phrase vectors and the dominant 
vectors. 

12. (Original) A method according to claim 11, wherein generating the
semantic abstract includes reducing the dominant phrase vectors based on the dominant 
vectors. 

13. (Original) A method according to claim 11, wherein generating the 
semantic abstract includes reducing the dominant vectors based on the dominant phrase 
vectors. 

14. (Original) A method according to claim 11, wherein generating the
semantic abstract includes obtaining a probability distribution function for a reduced set of 
the dominant phrase vectors similar to a probability distribution function for the dominant 
phrase vectors. 

15. (Original) A method according to claim 11, the method further comprising 
identifying the lexemes or lexeme phrases corresponding to state vectors in the semantic 
abstract. 

16. (Original) A computer-readable medium containing a program to 
determine a semantic abstract in a topological vector space for a semantic content of a 
document on a computer system, the program being executable on the computer system to 
implement the method of claim 11.

17. (Original) A method for comparing the semantic content of first and 
second documents on a computer system, the method comprising: 

determining semantic abstracts for the first and second documents; 
measuring a distance between the semantic abstracts; and 

classifying how closely related the first and second documents are using the distance. 

18. (Original) A method according to claim 17, wherein measuring a distance
includes measuring a Hausdorff distance between the semantic abstracts. 

19. (Original) A method according to claim 17, wherein measuring a distance 
includes determining a centroid vector in the topological vector space for each semantic 
abstract. 

20. (Original) A method according to claim 19, wherein measuring a distance 
further includes measuring an angle between the centroid vectors. 

21. (Original) A method according to claim 19, wherein measuring a distance
further includes measuring a Euclidean distance between the centroid vectors. 
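
By way of illustration only, and without limiting claims 18 to 21, the three distance measures might be sketched as follows. The sketch assumes that each semantic abstract is a non-empty list of equal-length tuples of floats, that the Hausdorff distance is taken over an underlying Euclidean metric, and that neither centroid is the zero vector when an angle is measured; the function names are hypothetical.

```python
import math


def hausdorff(a, b):
    """Hausdorff distance between two finite sets of vectors (cf. claim 18)."""
    def directed(x, y):
        # Largest distance from a point of x to its nearest point in y.
        return max(min(math.dist(p, q) for q in y) for p in x)
    return max(directed(a, b), directed(b, a))


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors (cf. claim 19)."""
    n = len(vectors)
    return tuple(sum(component) / n for component in zip(*vectors))


def centroid_angle(a, b):
    """Angle in radians between the two centroid vectors (cf. claim 20)."""
    ca, cb = centroid(a), centroid(b)
    dot = sum(x * y for x, y in zip(ca, cb))
    norm = math.hypot(*ca) * math.hypot(*cb)  # assumed non-zero
    # Clamp to [-1, 1] to guard against floating-point rounding.
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def centroid_distance(a, b):
    """Euclidean distance between the two centroid vectors (cf. claim 21)."""
    return math.dist(centroid(a), centroid(b))
```

The Hausdorff measure compares the abstracts as whole sets, whereas the centroid-based measures first collapse each abstract to a single vector and are therefore cheaper to compute but less sensitive to outlying state vectors.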

22. (Original) A computer-readable medium containing a program to compare
the semantic content of first and second documents on a computer system, the program being 
executable on the computer system to implement the method of claim 17. 

23. (Original) A method for locating a second document on a computer with a 
semantic content similar to a first document, the method comprising: 

determining a semantic abstract for the first document; 
locating a second document; 

determining a semantic abstract for the second document; 
measuring a distance between the semantic abstracts for the first and second 
documents; 

classifying how closely related the first and second documents are using the distance; and

if the second document is classified as having a semantic content similar to the 
semantic content of the first document, selecting the second document. 

24. (Original) A method according to claim 23, the method further 
comprising, if the second document is classified as not having a semantic content similar to 
the semantic content of the first document, rejecting the second document. 
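
By way of illustration only, and without limiting claims 23 and 24, the select-or-reject step might be sketched as follows. The sketch assumes that classification reduces to comparing the measured distance against a single caller-supplied `threshold`, and that the distance measure is passed in as a function; the claims recite neither a threshold nor a particular measure, and the names used here are hypothetical.

```python
def select_similar(first_abstract, candidates, distance, threshold):
    """Split candidate documents into selected and rejected names.

    first_abstract: semantic abstract of the first document.
    candidates: mapping of document name to its semantic abstract.
    distance: function returning a non-negative distance between abstracts.
    threshold: assumed cut-off; at or below it a document counts as similar.
    """
    selected, rejected = [], []
    for name, abstract in candidates.items():
        if distance(first_abstract, abstract) <= threshold:
            selected.append(name)   # cf. claim 23: select the second document
        else:
            rejected.append(name)   # cf. claim 24: reject the second document
    return selected, rejected
```

A graded classification scale could replace the single threshold by mapping ranges of distance to several degrees of relatedness.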

25. (Original) An apparatus on a computer system to determine a semantic 
abstract in a topological vector space for a semantic content of a document stored on the 
computer system, the apparatus comprising: 

a phrase extractor adapted to extract phrases from the document; 

a state vector constructor adapted to construct at least one state vector in the 
topological vector space for each phrase extracted by the phrase extractor; and 

collection means for collecting the state vectors into the semantic abstract for the 
document. 

26. (Original) An apparatus according to claim 25, the apparatus further 
comprising filter means for filtering the state vectors to reduce the size of the semantic 
abstract. 

27. (Original) An apparatus according to claim 25, wherein the state vector
constructor is further adapted to construct a state vector for each word in the document.

28. (Original) An apparatus on a computer system to compare the semantic 
content of first and second documents on a computer system, the apparatus comprising: 

first and second semantic abstracts for the first and second documents, respectively, 
stored on the computer system and represented as sets of vectors in a topological vector 
space; 

measuring means for measuring the distance between the first and second semantic 
abstracts; and 

a classification scale to determine how closely related the first and second documents 
are based on the distance between the first and second semantic abstracts.

29. (Original) A method for determining a semantic abstract in a topological
vector space for a semantic content of a document on a computer system, the method
comprising: 

extracting dominant phrases from the document using a phrase extractor, the 
dominant phrases representing a condensed content for the document; 

constructing at least one first state vector in the topological vector space for each 
dominant phrase using a dictionary and a basis; 

collecting the first state vectors into dominant phrase vectors for the document; 

extracting words from at least a portion of the document; 

constructing a second state vector in the topological vector space for each word using 
the dictionary and the basis; 

filtering the second state vectors; 

collecting the filtered second state vectors into dominant vectors for the document; and

generating the semantic abstract using the dominant phrase vectors and the dominant
vectors.

30. (Original) A method according to claim 29, the method further comprising
comparing the semantic abstract with a second semantic abstract for a second document to
determine how closely related the contents of the documents are.